The raw data for this project was downloaded from the GOV.UK website.
More information about the variables can be found in the codebook.txt file in the data folder.
Data is available from this link:
https://www.gov.uk/government/publications/deaths-associated-with-neurological-conditions
The URL for these pages is:
https://evie-lois.github.io/deaths-assoc-neuro/
The repository for these pages is:
Which neurological condition(s) has the highest rate of associated deaths from 2001 to 2014?
The visualisation aims to show the number of deaths associated with each neurological condition between 2001 and 2014, making it easy to see which condition had the highest number of associated deaths.
To get the data ready for visualisation, I needed to clean and reshape it. The raw dataset was a single table with two tables embedded in it, so I split these apart and recombined them into one complete dataset. In the process, I removed outliers and repeated columns. I also renamed the columns to short codes so they would be easier to work with.
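# Packages used in the code below
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(here)
# Split embedded table into two
# (`data` is the raw table, loaded before this step)
first_table <- data[1:16, ]
second_table <- data[17:nrow(data), ]
# Structure tables, exclude outliers
first_table <- first_table[-c(16), ]
second_table <- second_table[-c(16:31), ]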
# Swap the first row of each table to become the column names
colnames(first_table) <- first_table[1, ]
first_table <- first_table[-1, ]
colnames(second_table) <- second_table[1, ]
second_table <- second_table[-1, ]
# Clean data, removing outliers
data1_cleaned <- first_table[, -c(3,5,7,9,11,13,15,17,19)]
data1_cleaned <- data1_cleaned[, -c(2,11)]
data2_cleaned <- second_table[, -c(3,5,7,9,11,13,15,17,19)]
data2_cleaned <- data2_cleaned[, -c(8,10)]
# Combine tables
combined_table <- bind_cols(data1_cleaned, data2_cleaned)
## New names:
## • `Year of the registration of death` -> `Year of the registration of
## death...1`
## • `Year of the registration of death` -> `Year of the registration of
## death...10`
# Clean new dataset: drop the duplicated year column (column 10)
cleaned_dataset <- combined_table[, -10]
# Rename columns to short codes
cols <- c("year", "dwnc", "epi", "mndsma", "msid", "nmd", "poed", "tbsi", "totns", "at", "cnsi", "cnd", "dd", "fd", "ham", "raond", "smar")
colnames(cleaned_dataset) <- cols
# Reshape the data to long format
reshaped_dataset <- cleaned_dataset %>%
  pivot_longer(cols = -year,
               names_to = "Condition",
               values_to = "Deaths")
#Check for missing values in the dataset
sum(is.na(reshaped_dataset))
## [1] 0
#Check structure and data types
str(reshaped_dataset)
## tibble [224 × 3] (S3: tbl_df/tbl/data.frame)
## $ year : chr [1:224] "2001" "2001" "2001" "2001" ...
## $ Condition: chr [1:224] "dwnc" "epi" "mndsma" "msid" ...
## $ Deaths : chr [1:224] "23051" "1936" "1574" "1301" ...
#Check number of rows and columns
dim(reshaped_dataset)
## [1] 224 3
#Show cleaned and reshaped dataset
head(reshaped_dataset)
## # A tibble: 6 × 3
## year Condition Deaths
## <chr> <chr> <chr>
## 1 2001 dwnc 23051
## 2 2001 epi 1936
## 3 2001 mndsma 1574
## 4 2001 msid 1301
## 5 2001 nmd 563
## 6 2001 poed 6963
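The header promotion and reshaping steps above can be tried on a small stand-in table. This is a minimal sketch, not the project's actual pipeline: the `raw`, `toy`, and `toy_long` names are hypothetical, the column headers are simplified to the short codes, and the three values are the 2001 figures shown in the output above. It also assumes a tidyr version that supports `values_transform` (1.1.0 or later), which converts the counts to numeric during the reshape instead of afterwards.

```r
library(tidyr)

# Toy stand-in for the raw sheet: the header text sits in row 1, as in the source data.
# Values are the 2001 figures printed above; headers are simplified to the short codes
# (an assumption made for brevity).
raw <- data.frame(
  V1 = c("year", "2001"),
  V2 = c("dwnc", "23051"),
  V3 = c("epi", "1936"),
  V4 = c("mndsma", "1574"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop that row
colnames(raw) <- unlist(raw[1, ])
toy <- raw[-1, , drop = FALSE]

# Reshape to long format and convert the counts to numeric in the same step
toy_long <- pivot_longer(
  toy,
  cols = -year,
  names_to = "Condition",
  values_to = "Deaths",
  values_transform = list(Deaths = as.numeric)
)

print(toy_long)
```

With the counts converted at this point, `str()` would report `Deaths` as `num` rather than `chr`, so no separate conversion would be needed before plotting.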
Once the data is prepared, visualisation can begin.
I created my visualisation using ggplot2 and plotly. I wanted to represent the data so that it would be clear which neurological condition had the highest rate of associated deaths. I chose a horizontal layout because the data presented better this way, and I used ggplotly to make the plot interactive so the data can be viewed in more detail.
# Check whether the 'Deaths' column is numeric, as required for plotting
str(reshaped_dataset$Deaths)
## chr [1:224] "23051" "1936" "1574" "1301" "563" "6963" "1773" "4995" "80" ...
# 'Deaths' is stored as character, so convert it to numeric
reshaped_dataset$Deaths <- as.numeric(reshaped_dataset$Deaths)
# Create the scatter plot
p <- ggplot(reshaped_dataset, aes(x = year, y = Deaths, color = Condition)) +
  geom_point(size = 3) + # adjust point size
  # Title, subtitle, axis labels and data source
  labs(title = "Deaths associated with Neurological Conditions",
       subtitle = "Between the Years 2001 to 2014",
       x = "Year",
       y = "Number of Deaths",
       caption = "Source: GOV.UK") +
  theme_minimal() + # minimal theme for a clean look
  theme(plot.title = element_text(size = 16, face = "bold"), # title size and boldness
        plot.subtitle = element_text(hjust = 0.5, size = 12, face = "italic"), # subtitle size, position and italics
        legend.title = element_text(size = 11, face = "bold"), # legend title size and boldness
        axis.title = element_text(size = 14), # axis title size
        axis.text = element_text(size = 12), # axis text size
        plot.background = element_rect(fill = "snow")) + # background fill colour
  # Log10 scale on the deaths axis, with comma-formatted labels
  scale_y_log10(labels = scales::comma) +
  # Colour-code the conditions so they are visible on the graph and do not clash
  scale_color_manual(
    values = c("dwnc" = "red", "epi" = "green", "mndsma" = "lightcoral", "msid" = "pink",
               "nmd" = "purple", "poed" = "orange", "tbsi" = "magenta", "totns" = "cyan",
               "at" = "blue", "cnsi" = "dodgerblue", "cnd" = "darkgreen", "dd" = "lightblue",
               "fd" = "violet", "ham" = "gold", "raond" = "darkblue", "smar" = "darkgray"),
    # Rename the condition codes to full names for the legend
    labels = c("dwnc" = "Deaths with Mention of Neurological Condition",
               "epi" = "Epilepsy",
               "mndsma" = "Motor Neurone Disease and Spinal Muscular Atrophy",
               "msid" = "Multiple Sclerosis and Inflammatory Disorders",
               "nmd" = "Neuromuscular Diseases",
               "poed" = "Parkinsonism and Other Extrapyramidal Disorders",
               "tbsi" = "Traumatic Brain and Spinal Injury",
               "totns" = "Tumours of the Nervous System",
               "at" = "Ataxia",
               "cnsi" = "CNS Infections",
               "cnd" = "Cranial Nerve Damage",
               "dd" = "Development Disorders",
               "fd" = "Functional Disorders",
               "ham" = "Headaches and Migraines",
               "raond" = "Rare and Other Neurological Diseases",
               "smar" = "Spondylotic Myelopathy and Radiculopathy")
  ) +
  # Amend plot
  coord_flip() + # flip plot to horizontal
  theme(axis.line = element_line(color = "black")) # add black axis lines
print(p) #print the ggplot
ggplotly(p) #interactive plot
The final output is saved in the figs folder.
# Save the plot (passing `p` explicitly rather than relying on the last plot displayed)
ggsave(here("figs", "neurodeathsplot.png"), plot = p)
## Saving 7 x 5 in image
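`ggsave()` writes a static image, so the interactive version from `ggplotly()` is not kept. If an interactive copy were wanted as well, one option is `htmlwidgets::saveWidget()`. The sketch below is an assumption about how that could look, not part of the original project: `toy`, `p_toy`, and `out_file` are hypothetical names, the two values are 2001 figures from the output above, and the file is written to a temporary directory rather than the figs folder.

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

# Two 2001 values from the output above, used only to build a small example plot
toy <- data.frame(
  year = c("2001", "2001"),
  Condition = c("epi", "poed"),
  Deaths = c(1936, 6963)
)

p_toy <- ggplot(toy, aes(x = year, y = Deaths, color = Condition)) +
  geom_point(size = 3)

# Hypothetical output location; a temporary directory keeps the sketch side-effect free
out_file <- file.path(tempdir(), "neurodeathsplot.html")

# selfcontained = FALSE writes the supporting files next to the HTML,
# which avoids needing pandoc to bundle everything into one file
saveWidget(ggplotly(p_toy), file = out_file, selfcontained = FALSE)
```

The resulting HTML file can be opened in a browser and keeps the hover and zoom behaviour of the interactive plot.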
Interpretation
The visualisation shows the number of deaths associated with each neurological condition. Among the individual conditions, the one with the highest number of associated deaths from 2001 to 2014 is Parkinsonism and Other Extrapyramidal Disorders. This is an interesting finding because disorders within this category, such as Parkinson’s disease, affect a large number of individuals within the UK. Acknowledging the number of deaths associated with the disorder could help to show the public the importance of donating to charities such as Parkinson’s UK to make a difference for people living with Parkinson’s disease. Furthermore, this visualisation allows the different neurological conditions to be compared by the number of deaths associated with each. A limitation is that the data only cover 2001 to 2014, so an updated dataset would be useful for comparing the numbers of deaths within the last ten years.
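One way to back up this reading of the plot is to rank the conditions by their totals. The sketch below is illustrative only: it uses just the nine 2001 values printed earlier (not the full 2001 to 2014 data), the `deaths_2001` and `ranked` names are hypothetical, and it assumes `dwnc` is an overall total that should be left out of the comparison.

```r
library(dplyr)

# The first nine 2001 values from the str() output earlier, in column order
deaths_2001 <- tibble(
  year = "2001",
  Condition = c("dwnc", "epi", "mndsma", "msid", "nmd", "poed", "tbsi", "totns", "at"),
  Deaths = c(23051, 1936, 1574, 1301, 563, 6963, 1773, 4995, 80)
)

# Drop the assumed overall total, sum deaths per condition, and sort from highest to lowest
ranked <- deaths_2001 %>%
  filter(Condition != "dwnc") %>%
  group_by(Condition) %>%
  summarise(total_deaths = sum(Deaths), .groups = "drop") %>%
  arrange(desc(total_deaths))

print(ranked)
```

Running the same pipeline on `reshaped_dataset`, once `Deaths` has been converted to numeric, would give totals across all years rather than for 2001 alone.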
Follow-ups
To investigate further, I would include the age of the individuals and run a regression analysis to test whether there is an association between deaths from neurological disorders and old age. I would also repeat this analysis with an up-to-date dataset to see whether there have been any changes in the neurological conditions with the highest rate of associated deaths.
Conclusion
From completing this module and code project, I have learned how to use R efficiently after having no previous knowledge of the software. I have also learned to understand my mistakes and how to correct them within R. If I were to do this project again with more time and better data accessibility, I would follow through with my previous choice of public health data: a comparison of the Western and Mediterranean diets and their effects on life expectancy and disease. I would choose data from individuals who follow either diet and compare their health to see whether the Western diet is associated with more disease and shorter life expectancy. This would help to understand the effects of what we consume on our health, and presenting the data could show others that the cheap and convenient lifestyle of the Western diet can have a large impact on our lives.